Dialog-based Language Learning

Neural Information Processing Systems

A long-term goal of machine learning research is to build an intelligent dialog agent. Most research in natural language understanding has focused on learning from fixed training sets of labeled data, with supervision either at the word level (tagging, parsing tasks) or sentence level (question answering, machine translation). This kind of supervision does not realistically reflect how humans learn, where language is both learned by, and used for, communication. In this work, we study dialog-based language learning, where supervision is given naturally and implicitly in the response of the dialog partner during the conversation. We study this setup in two domains: the bAbI dataset of Weston et al. (2015) and large-scale question answering from Dodge et al. (2015). We evaluate a set of baseline learning strategies on these tasks, and show that a novel model incorporating predictive lookahead is a promising approach for learning from a teacher's response. In particular, a surprising result is that it can learn to answer questions correctly without any reward-based supervision at all.



Agent Planning with World Knowledge Model

Neural Information Processing Systems

Imitating humans' mental world knowledge model, which provides global prior knowledge before the task and maintains local dynamic knowledge during the task, in this paper we introduce a parametric World Knowledge Model (WKM) to facilitate agent



Scaling Sign Language Translation

Neural Information Processing Systems

Sign language translation (SLT) addresses the problem of translating information from a sign language in video to a spoken language in text. Existing studies, while showing progress, are often limited to narrow domains and/or few sign languages and struggle with open-domain tasks. In this paper, we push forward the frontier of SLT by scaling pretraining data, model size, and number of translation directions. We perform large-scale SLT pretraining on different data including 1) noisy multilingual YouTube SLT data, 2) parallel text corpora, and 3) SLT data augmented by translating video captions to other languages with off-the-shelf machine translation models. We unify different pretraining tasks with task-specific prompts under the encoder-decoder architecture, and initialize the SLT model with pretrained (m/By)T5 models across model sizes. SLT pretraining results on How2Sign and FLEURS-ASL#0 (ASL to 42 spoken languages) demonstrate the significance of data/model scaling and cross-lingual cross-modal transfer, as well as the feasibility of zero-shot SLT. We finetune the pretrained SLT models on 5 downstream open-domain SLT benchmarks covering 5 sign languages. Experiments show substantial quality improvements over the vanilla baselines, surpassing the previous state-of-the-art (SOTA) by wide margins.


Auslan-Daily: Australian Sign Language Translation for Daily Communication and News

Neural Information Processing Systems

Considering that different geographic regions generally have their own native sign languages, it is valuable to establish corresponding SLT datasets to support related communication and research. Auslan, as a sign language specific to Australia, still lacks a dedicated large-scale dataset for SLT.